Parametric encoder and method for encoding an audio or speech signal



13.07.2001 



17.07.2001



The invention relates to a parametric encoder and method for encoding an 
audio or speech signal into sinusoidal code data. 



Such encoders and methods are generally known in the art and are for example
disclosed in B. Edler, H. Purnhagen, and C. Ferekidis, "ASAC - Analysis/synthesis codec for
very low bit rates", Preprint 4179 (F-6), 100th AES Convention, Copenhagen, 11-14 May
1996. Such a known parametric encoder is illustrated in Figs. 4 and 5.

According to Fig. 5 the encoder comprises a segmentation unit 120' for
segmenting a received audio or speech signal into at least one single scale segment x_m(l)
having the samples x_m(0), ..., x_m(L-1). These samples are received by a sinusoidal
estimation unit 140' for estimating sinusoidal code data representing said segment x_m(n).
These sinusoidal code data are typically merged into a data stream before being transmitted
via a channel or stored on a recording medium.

Fig. 4 provides a more detailed, likewise known, illustration of the
segmentation unit 120'. As can be seen there, the audio or speech signal s(n) is input into a
tapped delay line comprising consecutive filters 122_1', 122_2', ..., 122_L-1'. The original
audio or speech signal s(n) = y_0(nD) as well as the output signals y_1(nD), ..., y_L-1(nD) of
said L-1 filters 122_1', ..., 122_L-1' are input into a sampling unit 124', preferably embodied
as a down-sampling unit, in order to generate the L samples x_m(0), ..., x_m(L-1) of the
segment x_m(l).
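
The known segmentation of Fig. 4 can be illustrated with a minimal sketch. It assumes that each filter of the tapped delay line is a unit delay (as stated for the tapped delay line in claim 7) and that the line starts from a zero state; the function name `delay_line_segment` and the toy signal are hypothetical and only serve the illustration.

```python
def delay_line_segment(s, m, D, L):
    """Return the L samples x_m(0..L-1) read from a tapped delay line at time m*D.

    Tap l holds the input delayed by l samples, so x_m(l) = s(m*D - l).
    Samples before the start of the signal are taken as zero (assumption).
    """
    n = m * D
    return [s[n - l] if 0 <= n - l < len(s) else 0.0 for l in range(L)]


signal = [float(i) for i in range(20)]          # toy input: s(n) = n
segment = delay_line_segment(signal, m=2, D=4, L=5)
print(segment)  # [8.0, 7.0, 6.0, 5.0, 4.0]
```

Every segment produced this way has the same length L, which is the fixed frequency resolution discussed next.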



The single scale segments as generated by the known parametric encoder
according to Figs. 4 and 5 are characterised in that their segment length, and consequently
also their frequency resolution, is constant, independent of the actual frequency range of the
segmented audio or speech signal. Expressed in other words, the single scale sinusoidal
estimation mechanism as provided in the common encoders gives problems with the required
time-frequency resolution trade-off. In particular, for high-quality audio coding a high
frequency resolution is required for low frequency ranges of the signal s, whereas for other
frequency ranges a lower frequency resolution, i.e. a shorter segment length L, would be
sufficient.



In order to overcome these problems, multi-scale models have been proposed,
for example by T.S. Verma, S.N. Levine and J.O. Smith III, "Multiresolution sinusoidal
modeling for wideband audio with modifications", in Proc. ICASSP-98, Seattle, 1998. These
multi-scale models provide different segment lengths L for different frequency ranges of the
signal s. However, these multi-scale models bring about problems of scattering of
components over scales and/or of merging the data retrieved at different scales. More
specifically, the problem of scattering is that the generated segments usually overlap, so
that samples of said segments might be processed twice, because no clear separation between
the samples of two generated segments is possible without considerable effort.

Starting from that prior art it is an object of the invention to improve a known 
parametric encoder and method for encoding an audio or speech signal such that a required 
time-frequency resolution trade-off can be established without having the above mentioned 
problems of the multi-scale models, namely the problem of scattering of components over 
scales and/or of merging the data retrieved at different scales. 

This object is solved by the subject matter of claim 1. More specifically, for
the known parametric encoder it is suggested according to claim 1 that the segmentation unit
is further embodied for carrying out a frequency-warping operation in order to transform the
output samples onto a frequency-warped domain, and that a post-processing filter is provided
for re-mapping said sinusoidal code data output from the sinusoidal estimation unit to the
original frequency domain of the signal s.

The segmentation unit of the claimed parametric encoder segments the signal s
into at least one single scale segment x_m(l). Because said unit only generates single scale
segments the problems of the multi-scale models known in the art do not occur here. Instead,
by applying the frequency-warping operation the required time-frequency resolution
trade-off, i.e. providing different frequency resolutions for different frequency ranges of
the signal s, can advantageously be established for single scale segments without any
problems.

It shall be noted here that unilateral frequency-warping is generally known in
the art, e.g. for linear predictive coding of audio, audio equalisation and normal filter
design, but not for sinusoidal coding as suggested in this application. Bilateral frequency
warping has not been applied in audio processing.

Advantageous embodiments of that parametric encoder are mentioned in the 
dependent claims. 

The object is further solved by a method for encoding an audio or speech
signal according to claim 9. The advantages of said method correspond to the advantages
mentioned above for the parametric encoder.

Five figures accompany the description, wherein

Fig. 1 shows a first preferred embodiment of the parametric encoder according
to the invention;

Fig. 2 shows a second preferred embodiment of the parametric encoder
according to the invention;

Fig. 3 shows a third preferred embodiment of the parametric encoder
according to the invention;

Fig. 4 shows a detailed illustration of a parametric encoder known in the art;
and

Fig. 5 shows a general block diagram of the parametric encoder known in the
art.

In the following the preferred embodiments of the parametric encoder 
according to the invention are described by referring to Figs. 1 to 3. 

Fig. 1 shows a first preferred embodiment of the parametric encoder according
to the invention for encoding an audio or speech signal s(n) into sinusoidal code data scd. It
comprises a segmentation unit 120 for segmenting said signal s into at least one single scale
segment x_m(n) with m = 1 ... M, where m denotes a current downsampling step. More
specifically, said segmentation unit 120 comprises a plurality of L-1 filters 122_1, ...,
122_L-1 being connected in series for receiving the signal s(n) at the input of the first of
said filters 122_1. Said segmentation unit 120 further comprises a sampling unit 124 for
receiving and preferably down-sampling said signal s(n) = y_0(n) as well as the output signals
y_1(n), ..., y_L-1(n) of said L-1 filters 122_1, ..., 122_L-1 in order to generate L samples
x_m(0), ..., x_m(L-1) of the single scale segment x_m(l) with l = 0 ... (L-1). In said first
embodiment all of the L-1 filters 122_1, ..., 122_L-1 are embodied as all-pass filters having
a transfer function A(z) defined as:

\[ A(z) = \frac{-\lambda^{*} + z^{-1}}{1 - \lambda z^{-1}}, \tag{1} \]

where * denotes complex conjugation and |λ| < 1. Typically, λ is real-valued and λ ≠ 0.
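
The all-pass property of equation (1) can be checked numerically. The following is a minimal sketch for real-valued λ; the function names and the difference-equation form y(n) = -λ x(n) + x(n-1) + λ y(n-1), obtained by multiplying out equation (1), are illustrative choices and not taken from the description.

```python
import cmath
import math


def allpass_response(lam, theta):
    """Frequency response A(e^{j*theta}) of equation (1) for real lam."""
    z_inv = cmath.exp(-1j * theta)
    return (-lam + z_inv) / (1 - lam * z_inv)


def allpass_filter(x, lam):
    """Filter x through A(z): y(n) = -lam*x(n) + x(n-1) + lam*y(n-1), zero initial state."""
    y, x_prev, y_prev = [], 0.0, 0.0
    for x_n in x:
        y_n = -lam * x_n + x_prev + lam * y_prev
        y.append(y_n)
        x_prev, y_prev = x_n, y_n
    return y


lam = 0.5
gains = [abs(allpass_response(lam, k * math.pi / 8)) for k in range(9)]
impulse = allpass_filter([1.0] + [0.0] * 4, lam)
print(gains)    # all approximately 1.0: only the phase depends on frequency
print(impulse)  # [-0.5, 0.75, 0.375, 0.1875, 0.09375]
```

The magnitude is one at every frequency, so the filter changes only the phase, which is what produces the warping of the frequency axis.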

In that first embodiment the processing is the following:
The audio signal s is input to a tapped all-pass line having outputs y_l(n) (l = 0, 1, ..., L-1)
with

\[ y_0(n) = s(n), \quad \text{and} \tag{2} \]

\[ y_l = y_{l-1} * a \quad \text{for } l = 1, 2, \ldots, L-1 \tag{3} \]

with * denoting convolution and a the impulse response associated with the transfer
function A(z). The outputs y_l are downsampled (read out every D time instances) and defined
as a segment x_m:

\[ x_m(l) = y_l(mD) \tag{4} \]

where D is the downsampling factor of the sampling unit 124. The signal output by said
sampling unit 124 is considered to represent the samples x_m(l) with l = 0, 1, ..., L-1 of a
segment x_m.
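
Equations (2) to (4) can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the segmentation unit 120: λ is taken as real, the filters start from a zero state, and the segment index m starts at 0 here rather than 1. The names `allpass_filter` and `warped_segments` are hypothetical.

```python
def allpass_filter(x, lam):
    """First-order all-pass of equation (1), real lam, zero initial state."""
    y, x_prev, y_prev = [], 0.0, 0.0
    for x_n in x:
        y_n = -lam * x_n + x_prev + lam * y_prev
        y.append(y_n)
        x_prev, y_prev = x_n, y_n
    return y


def warped_segments(s, lam, L, D):
    """Return the segments x_m(l) = y_l(m*D), l = 0..L-1, one list per m."""
    taps = [list(s)]                                  # y_0(n) = s(n), equation (2)
    for _ in range(1, L):
        taps.append(allpass_filter(taps[-1], lam))    # y_l = y_{l-1} * a, equation (3)
    # Read all taps every D samples, equation (4).
    return [[y[n] for y in taps] for n in range(0, len(s), D)]


signal = [float(i) for i in range(10)]
print(warped_segments(signal, lam=0.0, L=3, D=4))
# lam = 0 turns every all-pass into a unit delay: [[0,0,0], [4,3,2], [8,7,6]]
print(warped_segments(signal, lam=0.5, L=3, D=4)[1])
```

For λ = 0 the sketch reduces to the tapped delay line of Fig. 4; any other λ yields samples on the frequency-warped domain.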

It is important to note that because the filters 122_1, ..., 122_L-1 are,
according to the first embodiment, embodied as all-pass filters, the samples output by the
sampling unit 124 are on a frequency-warped domain.

Said samples x_m(l) with l = 0, ..., L-1 are input into a sinusoidal estimation unit
140 for estimating the sinusoidal code data representing the segment x_m. The estimation may
be done by carrying out a Fourier transformation on said frequency-warped samples and
subsequently, for instance, peak picking.
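
A minimal sketch of such an estimation by Fourier transformation and peak picking is given below. The description does not specify windowing, thresholds or interpolation, so the plain DFT, the fixed threshold and the names `dft_magnitudes` and `pick_peaks` are assumptions made for illustration only.

```python
import cmath
import math


def dft_magnitudes(x, n_bins):
    """Magnitudes of the n_bins-point DFT of x for bins 0..n_bins//2."""
    return [abs(sum(x_l * cmath.exp(-2j * math.pi * k * l / n_bins)
                    for l, x_l in enumerate(x)))
            for k in range(n_bins // 2 + 1)]


def pick_peaks(mags, threshold):
    """Return (bin, magnitude) pairs of local maxima above the threshold."""
    return [(k, mags[k]) for k in range(1, len(mags) - 1)
            if mags[k] > threshold and mags[k] >= mags[k - 1] and mags[k] > mags[k + 1]]


L = 32
segment = [math.cos(2 * math.pi * 4 * l / L) for l in range(L)]  # toy segment, one sinusoid
peaks = pick_peaks(dft_magnitudes(segment, L), threshold=1.0)
print(peaks)  # a single peak at bin 4 with magnitude close to L/2 = 16
```

When the segment comes from the tapped all-pass line, the bin positions found this way are warped frequencies, not frequencies of the signal s.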



It is further important to note that the sinusoidal code data as output by said
sinusoidal estimation unit 140 is on a frequency-warped domain. Consequently, said sinusoidal
code data has to be re-mapped, i.e. de-warped, to the original frequency domain of the
audio or speech signal s. This is done by a post-processing filter 160 following said
sinusoidal estimation unit 140. The output of said post-processing filter 160 corresponds to
the re-mapped sinusoidal code data associated with the original signal segment x_m.
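
The frequency part of such a re-mapping can be sketched from the phase of the all-pass of equation (1). The sketch assumes real λ and treats the warped frequency as the negative phase of A at the original frequency; whether the post-processing filter 160 also corrects amplitudes or phases is not stated in the description and is not covered here. The function names are hypothetical.

```python
import cmath
import math


def warp_frequency(theta, lam):
    """Warped frequency phi = -arg A(e^{j*theta}) for the all-pass of equation (1)."""
    z_inv = cmath.exp(-1j * theta)
    return -cmath.phase((-lam + z_inv) / (1 - lam * z_inv))


def dewarp_frequency(phi, lam):
    """Inverse mapping: the same all-pass phase with lam replaced by -lam."""
    return warp_frequency(phi, -lam)


lam = 0.5
theta = 0.2 * math.pi                 # original frequency in radians per sample
phi = warp_frequency(theta, lam)      # where the sinusoid appears after warping
print(theta, phi, dewarp_frequency(phi, lam))
```

For positive λ the low frequencies are stretched (phi is larger than theta in the lower band), which is the increased low-frequency resolution the warping aims at; de-warping returns the estimate to the axis of the signal s.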


After sinusoidal extraction, as finished by said post-processing filter 160, the
subsequent processing step is residual modelling. The cheapest way of residual modelling is
using a parametric model for the power spectral density functions. Such an approach allows
the integration of sinusoidal and noise estimation, since frequency-warping can also be used
for noise modelling.



In the first embodiment the frequency-warped samples generated by said
segmentation unit 120 belong to a single scale segment x_m, with the result that the problems
of multi-scale models known in the art do not occur here. Because the filters are embodied
as all-pass filters, a frequency-warping operation is carried out, resulting in the
frequency-warped samples at the output of the sampling unit 124. Due to the frequency-warping
operation the required time-frequency resolution trade-off is achieved for the signal s.
However, disadvantageously, the power spectral density function of the original audio or
speech signal is slightly altered.

Fig. 2 shows a second embodiment of the parametric encoder which
substantially corresponds to the first embodiment. In particular, the sampling unit 124, the
sinusoidal estimation unit 140 and the post-processing filter 160 in the second embodiment
are identical to the corresponding units in the first embodiment. Moreover, the filters 122_3,
..., 122_L-1 correspond to the respective filters in the first embodiment because they are
also embodied as first-order all-pass filters having a transfer function A(z) according to
equation (1).

However, the second embodiment differs from the first embodiment in that the
first filter 122_1 in the series connection of filters in the segmentation unit 120 has a
transfer function A_0(z) according to:

A_0(z) = [...]    (5)

Moreover, the second filter 122_2 is also not embodied as an all-pass filter but
has instead a transfer function A_1(z) according to:

A_1(z) = [...]    (6)

wherein in equations (5) and (6) λ is typically real-valued.

For λ > 0 the transfer functions A_0(z) and A_1(z) both represent a low-pass
filter, whereas for λ < 0 both transfer functions represent a high-pass filter.

The advantages of the second embodiment correspond to those of the first
embodiment. Moreover, the shape of the power spectral density function of the original audio
or speech signal s is better maintained.

A problem of the first and second embodiments is that the introduced frequency
warping operation acts as a unilateral device. The past is warped and, as a consequence of the
fact that effectively the time-scale for each frequency is different, the estimated frequencies
are good estimates for the instantaneous frequencies some n samples ago, where n,
representing the delay of the instantaneous frequencies, is dependent on the instantaneous
frequencies themselves. Expressed in other words, the presence of the delay as such is
accepted, but its frequency dependency should be avoided because this frequency
dependency is disadvantageous for encoding purposes; for encoding purposes an estimate of
the instantaneous frequencies at a well-defined moment in time is desired.
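
The frequency dependence of the delay can be made concrete with the group delay of the first-order all-pass of equation (1). The sketch below uses the standard closed form for real λ; the choice λ = 0.5 and the number of 16 cascaded sections are arbitrary illustration values.

```python
import math


def allpass_group_delay(theta, lam):
    """Group delay in samples of the first-order all-pass of equation (1), real lam."""
    return (1 - lam ** 2) / (1 - 2 * lam * math.cos(theta) + lam ** 2)


lam, sections = 0.5, 16
for theta in (0.0, math.pi / 2, math.pi):
    # The delay of a cascade is the sum of the delays of its sections.
    print(round(theta, 3), sections * allpass_group_delay(theta, lam))
```

With these values the cascade delays the lowest frequencies by 48 samples but the highest by only about 5, so the tap outputs describe different moments in time for different frequencies.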

To achieve this, it is proposed to extend the frequency-warping procedure to a
bi-lateral operation, warping both the past and the future. The latter is not possible with
the mechanisms considered in embodiments 1 and 2 since these are based on infinite-impulse
response (IIR) filters.

However, considering the frequency-warping of a finite segment and
observing a finite part of the ideally infinitely-long warped signal, the processing using
IIR-filters reduces to a matrix-vector multiplication. In that case the parametric encoder can
be embodied according to a third embodiment of the invention as shown in Fig. 3. According
to that embodiment the received audio or speech signal is input into a tapped delay line and
subsequently said audio or speech signal s as well as the output signals y_1(n), ..., y_L-1(n)
of the L-1 filters 122_1, ..., 122_L-1 of the tapped delay line are input into a sampling unit
124 for generating a segment having a number of N_1 + 1 + N_2 samples being indexed -N_1,
-N_1 + 1, ..., 0, ..., N_2 - 1, N_2 with N_1, N_2 > 0. It is important to note that the sampling
operation carried out so far in the third embodiment corresponds to the sampling operation
known in the art as described by referring to Fig. 4 and that the samples x°_m(-N_1), ...,
x°_m(0), ..., x°_m(N_2) resulting from that common sampling operation at the output of the
sampling unit are not yet on a frequency-warped domain.

In order to transform the samples onto the frequency-warped domain a
bi-lateral warping operation is carried out by an additionally provided bi-lateral warping
unit 126, preferably also provided within said segmentation unit 120. Said unit carries out
the matrix-vector multiplication mentioned in the previous paragraph, written in matrix
notation:

\[ x_m = B\,x^{\circ}_m \tag{7} \]

The transformation matrix B can be calculated for different frequency-warping
operations; in particular it can be calculated such that the frequency-warping operations
according to embodiment 1 or 2 of the invention are simulated or realised by the third
embodiment. The samples output by said bi-lateral warping unit 126 are, in contrast to the
input samples, on the desired frequency-warped domain like the samples output by the
segmentation unit 120 according to embodiments 1 or 2. As can be seen from Fig. 3 the
transformed samples are output to the sinusoidal estimation unit 140 in which the desired
sinusoidal code data are estimated, and finally the sinusoidal code data on the
frequency-warped domain is output by said estimation unit 140 and input into the
post-processing filter 160 for being re-mapped to the original frequency domain of the
signal s. Subsequently, an example for calculating the transformation matrix B is given such
that embodiment 2 is simulated by embodiment 3.



In order to achieve this simulation, frequency-warping of a segment x(n)
having a finite support is considered. More specifically, the samples of said segment are
indexed -N_1, -N_1 + 1, ..., 0, ..., N_2 with N_1, N_2 > 0. The associated warped signal is
denoted by x̃(n) and has, in principle, an infinite support.

The Fourier transforms of the segment x(n) and of the associated warped signal
x̃(n) are given as

\[ X(e^{j\theta}) = \sum_{n} x(n)\,e^{-j\theta n} \]

\[ \tilde{X}(e^{j\varphi}) = \sum_{n} \tilde{x}(n)\,e^{-j\varphi n} \]

with j = √(-1). For frequency-warping according to the phase characteristic of an all-pass
section the following relation between these frequency variables is given:

[...]    (8)

or

[...]    (9)

From this it follows that

\[ \tilde{x}(n) = \frac{1}{2\pi}\int_{\langle 2\pi\rangle} \tilde{X}(e^{j\varphi})\,e^{j\varphi n}\,d\varphi = [\ldots] \tag{10} \]

with the definition of the interpolation function q

\[ [\ldots]\;\frac{e^{j\theta}+\lambda}{1+e^{j\theta}\lambda}\;[\ldots] \tag{11} \]

and F_n denoting the inverse Fourier transformation to the n-domain. More specifically,

q(λ; n, 0) = δ(n),
q(λ; n, k) = impulse response of a kth-order all-pass, k > 0,
q(λ; n, k) = q(λ; -n, -k),
q(λ; n, k) = 0, if n · k < 0 or (k = 0 and n ≠ 0).
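
The four listed properties are enough to evaluate q numerically. The sketch below is an illustration under assumptions: it takes the kth-order all-pass to be k cascaded sections of equation (1) with real λ, and truncates each impulse response to `length` samples. Both function names are hypothetical.

```python
def allpass_power_response(lam, k, length):
    """First `length` samples of the impulse response of k cascaded all-pass sections."""
    h = [1.0] + [0.0] * (length - 1)           # k = 0: unit impulse
    for _ in range(k):
        out, x_prev, y_prev = [], 0.0, 0.0
        for x_n in h:                           # y(n) = -lam*x(n) + x(n-1) + lam*y(n-1)
            y_n = -lam * x_n + x_prev + lam * y_prev
            out.append(y_n)
            x_prev, y_prev = x_n, y_n
        h = out
    return h


def q(lam, n, k, length=64):
    """Interpolation function q(lam; n, k) built from the four listed properties."""
    if k < 0:
        return q(lam, -n, -k, length)           # q(n, k) = q(-n, -k)
    if n < 0 or n >= length:
        return 0.0                              # zero for n*k < 0 and beyond the truncation
    return allpass_power_response(lam, k, length)[n]


print(q(0.5, 0, 0), q(0.5, 3, 0))    # 1.0 0.0  (unit impulse for k = 0)
print(q(0.5, 0, 1), q(0.5, 1, 1))    # -0.5 0.75
print(q(0.5, -1, -1), q(0.5, -1, 1)) # 0.75 0.0
```

Negative k yields the time-reversed responses, which is what allows the future as well as the past to be warped.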



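In matrix notation (omitting λ from the notation for this specific case) equation
(7) can be written as:

\[
\begin{pmatrix} \vdots \\ x_m(-n) \\ \vdots \\ x_m(0) \\ x_m(1) \\ \vdots \\ x_m(n) \\ \vdots \end{pmatrix}
= B \begin{pmatrix} x^{\circ}_m(-N_1) \\ \vdots \\ x^{\circ}_m(0) \\ x^{\circ}_m(1) \\ \vdots \\ x^{\circ}_m(N_2) \end{pmatrix}
=
\begin{pmatrix}
\vdots & & \vdots & \vdots & \vdots & & \vdots \\
q(-n,-N_1) & \cdots & q(-n,-1) & 0 & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & \vdots & & \vdots \\
q(0,-N_1) & \cdots & q(0,-1) & 1 & q(0,1) & \cdots & q(0,N_2) \\
0 & \cdots & 0 & 0 & q(1,1) & \cdots & q(1,N_2) \\
\vdots & & \vdots & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & 0 & q(n,1) & \cdots & q(n,N_2) \\
\vdots & & \vdots & \vdots & \vdots & & \vdots
\end{pmatrix}
\begin{pmatrix} x^{\circ}_m(-N_1) \\ \vdots \\ x^{\circ}_m(0) \\ x^{\circ}_m(1) \\ \vdots \\ x^{\circ}_m(N_2) \end{pmatrix}
\tag{12}
\]

i.e. column-wise the impulse responses of the cascaded all-pass filters appear. In practice, a
truncated (windowed) warped signal x̃ will be used for further processing. Assume that
the part of x̃ to be considered ranges from -M_1 to M_2, that M_1 ≈ M_2 > 0 and that
N_1 ≈ N_2. Then approximately half of the matrix equals zero. For positive λ, the support of
the truncated x̃ will effectively be shorter than that of x.

The rows of the matrix correspond to the (truncated) impulse responses of the
filters described in embodiment 2.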
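
The matrix-vector form can be sketched as follows. This is an illustration under assumptions and not the matrix of the description itself: the entries are built from cascades of the plain all-pass of equation (1) with real λ (the first-embodiment type of warping), rows are indexed n = -M_1 ... M_2 and columns k = -N_1 ... N_2. The names `warping_matrix` and `warp_segment` are hypothetical.

```python
def allpass_power_response(lam, k, length):
    """First `length` samples of the impulse response of k cascaded all-pass sections."""
    h = [1.0] + [0.0] * (length - 1)
    for _ in range(k):
        out, x_prev, y_prev = [], 0.0, 0.0
        for x_n in h:                           # y(n) = -lam*x(n) + x(n-1) + lam*y(n-1)
            y_n = -lam * x_n + x_prev + lam * y_prev
            out.append(y_n)
            x_prev, y_prev = x_n, y_n
        h = out
    return h


def warping_matrix(lam, n1, n2, m1, m2):
    """Matrix with entries q(n, k): rows n = -m1..m2, columns k = -n1..n2."""
    length = max(m1, m2) + 1
    responses = [allpass_power_response(lam, k, length) for k in range(max(n1, n2) + 1)]

    def entry(n, k):
        if k < 0:
            n = -n                              # q(n, k) = q(-n, -k)
        return responses[abs(k)][n] if n >= 0 else 0.0

    return [[entry(n, k) for k in range(-n1, n2 + 1)] for n in range(-m1, m2 + 1)]


def warp_segment(B, x):
    """Matrix-vector product of equation (7)."""
    return [sum(b * v for b, v in zip(row, x)) for row in B]


B = warping_matrix(0.5, n1=2, n2=2, m1=3, m2=3)
segment = [0.0, 1.0, 2.0, 1.0, 0.0]             # samples indexed -2..2
print(warp_segment(B, segment))
zeros = sum(v == 0.0 for row in B for v in row)
print(zeros, "of", len(B) * len(B[0]), "entries are zero")
```

For λ = 0 and matching index ranges the matrix is the identity, so the unwarped segment passes through unchanged; for other λ roughly half of the entries vanish, as noted above.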

It should be noted that the above-mentioned embodiments illustrate rather than 
limit the invention, and that those skilled in the art will be able to design many alternative 
embodiments without departing from the scope of the appended claims. In the claims, any 
reference signs placed between parentheses shall not be construed as limiting the claim. The 
word 'comprising' does not exclude the presence of other elements or steps than those listed 
in a claim. The invention can be implemented by means of hardware comprising several 
distinct elements, and by means of a suitably programmed computer. In a device claim 
enumerating several means, several of these means can be embodied by one and the same
item of hardware. The mere fact that certain measures are recited in mutually different 
dependent claims does not indicate that a combination of these measures cannot be used to 
advantage. 

1. A parametric encoder for encoding an audio or speech signal s into sinusoidal
code data, comprising:

a segmentation unit (120) for segmenting said signal s into at least one single
scale segment x_m(n) with m = 1 ... M and for outputting the samples x_m(0), ..., x_m(L-1) of
said segment x_m(n); and

a sinusoidal estimation unit (140) for estimating the sinusoidal code data
representing said segment x_m(n) from the received samples x_m(0), ..., x_m(L-1);

characterized in that

the segmentation unit (120) is further embodied for carrying out a
frequency-warping operation in order to transform the output samples x_m(0), ..., x_m(L-1)
onto a frequency-warped domain; and

a post-processing filter (160) is provided for re-mapping said sinusoidal code data
output from the sinusoidal estimation unit (140) to the original frequency domain of the
signal s.

2. The parametric encoder according to claim 1, characterized in that the
segmentation unit (120) comprises

a plurality of L-1 filters (122_1, ..., 122_L-1) being connected in series for
receiving the signal s(n) at the input of the first of said filters (122_1); and

a sampling unit (124) for receiving and sampling said signal s(n) = y_0(n) as well
as the output signals y_1(n), ..., y_L-1(n) of said L-1 filters (122_1, ..., 122_L-1) in order
to generate L samples x_m(0), ..., x_m(L-1) or x°_m(0), ..., x°_m(L-1) of the segment x_m.

3. The parametric encoder according to claim 2, characterized in that at least
some of the filters (122_1, ..., 122_L-1) are embodied as all-pass filters.

4. The parametric encoder according to claim 3, characterized in that said some
filters (122_1, ..., 122_L-1) are embodied as first-order all-pass filters each having a
transfer function A(z) according to:

\[ A(z) = \frac{-\lambda^{*} + z^{-1}}{1 - \lambda z^{-1}} \]

wherein λ* denotes the complex conjugate of λ and wherein λ is preferably real-valued.

5. The parametric encoder according to claim 4, characterized in that all of the
filters (122_1, ..., 122_L-1) out of the plurality of filters are embodied as first-order
all-pass filters, each having a transfer function A(z) according to:

\[ A(z) = \frac{-\lambda^{*} + z^{-1}}{1 - \lambda z^{-1}} \]

wherein λ* denotes the complex conjugate of λ and wherein λ is preferably real-valued.

6. The parametric encoder according to claim 4, characterized in that the first
filter (122_1) in said series connection receiving the signal s(n) has a transfer function
A_0(z) according to:

A_0(z) = [...];

the second filter (122_2) in said series connection following said first filter (122_1) has a
transfer function A_1(z) according to:

A_1(z) = [...]; and

the remaining filters (122_3 ... 122_L-1) each are first-order all-pass filters having a
transfer function A(z) according to claim 4.



7. The parametric encoder according to claim 2, characterized in that

in the segmentation unit (120) the plurality of L-1 filters (122_1, ..., 122_L-1)
being connected in series is embodied as a tapped delay line with each of the filters having a
transfer function of A(z) = z^{-1}; and

there is additionally provided a bi-lateral warping unit (126) for transforming
the samples x°_m(-N_1), ..., x°_m(N_2) on the original frequency domain of the signal s, output
by the sampling unit (124), into transformed samples x_m(-M_1), ..., x_m(M_2) on a
frequency-warped domain by applying a bi-lateral frequency-warping operation to the samples
x°_m(-N_1), ..., x°_m(N_2) and for outputting the transformed samples x_m(-M_1), ..., x_m(M_2)
to said sinusoidal estimation unit (140).

8. The parametric encoder according to claim 7, characterized in that the
bi-lateral warping unit (126) carries out the transformation of the samples x°_m into the
samples x_m according to:

[...]

wherein q columnwise represents the impulse responses of the tapped line of all-pass filters
(122_1 ... 122_L-1).

9. Method for encoding an audio or speech signal s into sinusoidal code data,
comprising the steps of:

segmenting said signal s into at least one single scale segment x_m(n) with
m = 1 ... M having the samples x_m(0), ..., x_m(L-1); and

estimating the sinusoidal code data representing said segment x_m(n) from the
received samples x_m(0), ..., x_m(L-1);

characterized in that

a frequency-warping operation is carried out such that the samples x_m(0), ...,
x_m(L-1) are provided on a frequency-warped domain; and

said sinusoidal code data being estimated on the frequency-warped domain are
re-mapped to the original frequency domain of the signal s.

The invention relates to a parametric encoder for encoding an audio or speech
signal into sinusoidal code data. Such parametric encoders typically comprise a segmentation
unit 120 for segmenting said signal s into at least one single scale segment x_m(n) with
m = 1 ... M and for outputting the samples x_m(0), ..., x_m(L-1) of said segment x_m(n) and
comprise a sinusoidal estimation unit 140 for estimating the sinusoidal code data representing
said segment x_m(n) from said samples. It is the object of the invention to improve a
parametric encoder and method such that the achievement of a required time-frequency
resolution trade-off is facilitated. This is achieved by embodying the segmentation unit 120
such that it carries out a frequency-warping operation in order to transform the output
samples x_m(0), ..., x_m(L-1) onto a frequency-warped domain and by providing a
post-processing filter 160 for re-mapping the sinusoidal code data output by the sinusoidal
estimation unit 140 to the original frequency domain of the signal s.